Overlapping computation and communication of three-dimensional FDTD on a GPU cluster

Authors

  • Ki-Hwan Kim
  • Q.-Han Park
Abstract

Large-scale electromagnetic field simulations using the FDTD (finite-difference time-domain) method require the use of GPU (graphics processing unit) clusters. However, the communication overhead caused by slow interconnections becomes a major performance bottleneck. In this paper, as a way to remove the bottleneck, we propose the ‘kernel-split method’ and the ‘host-buffer method’, which overlap computation and communication for the FDTD simulation on the GPU cluster. The host-buffer method in particular enables overlapping without any modifications to the update-kernels that are already in use. We also present theoretical formulas to predict the overlap threshold and the total throughput for each method. By using our overlap methods with 6 GPU nodes, we demonstrate that the total performance of 3D FDTD reaches 92% of a six-fold increase, which is the upper limit that would be reached if there were no communication overhead. © 2012 Elsevier B.V. All rights reserved.
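As a rough illustration of why overlapping matters, the sketch below uses a generic idealized timing model. It is not the paper's own formulas, which the abstract does not reproduce. The assumptions are that one time step costs `t_single` on a single GPU, that the computation divides evenly across `n_nodes`, and that each step needs a fixed boundary exchange time `t_comm`. All function names and the sample timings are hypothetical.

```python
def step_time(t_compute, t_comm, overlap):
    """Per-step wall time on one node under the idealized model.

    Without overlap the exchange waits for the update to finish, so the
    costs add. With perfect overlap the longer of the two dominates.
    """
    if overlap:
        return max(t_compute, t_comm)
    return t_compute + t_comm


def speedup(n_nodes, t_single, t_comm, overlap):
    """Speedup over a single GPU, assuming an even split of the grid."""
    t_compute = t_single / n_nodes
    return t_single / step_time(t_compute, t_comm, overlap)


def fully_hidden(n_nodes, t_single, t_comm):
    """True when communication fits entirely behind computation."""
    return t_comm <= t_single / n_nodes


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units, not measurements from the paper.
    t_single, t_comm, n_nodes = 6.0, 0.5, 6
    print(speedup(n_nodes, t_single, t_comm, overlap=False))
    print(speedup(n_nodes, t_single, t_comm, overlap=True))
    # The abstract reports 92% of the six-fold ideal on 6 GPU nodes.
    print(0.92 * 6)
```

In this model, communication is fully hidden only while `t_comm` stays at or below the per-node computation time. For reference, 92% of a six-fold increase corresponds to a speedup of about 5.52 over a single GPU.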


Similar resources

A Novel Scheme for High Performance Finite-Difference Time-Domain (FDTD) Computations Based on GPU

Finite-Difference Time-Domain (FDTD) has been proved to be a very useful computational electromagnetic algorithm. However, the scheme based on traditional general purpose processors can be computationally prohibitive and require thousands of CPU hours, which hinders the large-scale application of FDTD. With rapid progress on GPU hardware capability and its programmability, we propose in this pa...


Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster

In this paper, we propose an approach to obtaining enhanced performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node connections. For one node with a quad-core Intel Xeon W3520 processor and a NVIDIA Tesla C1060 GPU card, we implement a CPU–GPU parallel double-precision general matrix–matrix multiplication (dgemm) operation, and achieve a per...


Multi-GPU-based Swendsen-Wang multi-cluster algorithm with reduced data traffic

The computational performance of multi-GPU applications can be degraded by the data communication between each GPU. To realize high-speed computation with multiple GPUs, we should minimize the cost of this data communication. In this paper, I propose a multiple GPU computing method for the Swendsen–Wang (SW) multi-cluster algorithm that reduces the data traffic between each GPU. I realize this ...


Application of the parabolic equation method to the analysis of indoor wave propagation problems

With the rapid growth of indoor wireless communication systems, the need to accurately model radio wave propagation inside the building environments has increased. Many site-specific methods have been proposed for modeling indoor radio channels. Among these methods, the ray tracing algorithm and the finite-difference time domain (FDTD) method are the most popular ones. The ray tracing approach ...


Scalable lattice Boltzmann solvers for CUDA GPU clusters

The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint it reduces to a regular data parallel procedure and is therefore well-suited to high performance computations. Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions and even fewer GPU cluster implemen...



Journal title:
  • Computer Physics Communications

Volume 183  Issue 

Pages  -

Publication date 2012